pacman::p_load(sf, tidyverse, funModeling, tmap, maptools, raster, spatstat, dplyr,ggplot2, tmap, RColorBrewer)Take-home_Ex01
Getting started - Load R packages
Importing GeoSpatial Dataset
NGA <- st_read("data/geospatial/",
layer = "nga_admbnda_adm2") %>%
st_transform(crs = 26392)Reading layer `nga_admbnda_adm2' from data source
`C:\yifei-alpaca\IS415-GAA\Take-home_Ex\Take-home_Ex01\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 774 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS: WGS 84
Instead of loading just Osun state, it is good to keep a raw data above in case we need to roll back or for checking purposes.
NGA_osun <- st_read("data/geospatial/",
layer = "nga_admbnda_adm2") %>%
st_transform(crs = 26392) %>% filter(`ADM1_EN` == "Osun")Reading layer `nga_admbnda_adm2' from data source
`C:\yifei-alpaca\IS415-GAA\Take-home_Ex\Take-home_Ex01\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 774 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS: WGS 84
Importing Aspatial dataset
wp_nga <- read_csv("data/aspatial/WPdx.csv")Filter the raw dataset to only in Osun State, Nigeria.
Instead of loading just Osun state, it is good to keep a raw data above incase we need to roll back or for checking purposes. We will be using wp_osun_nga dataframe.
wp_osun_nga <- read_csv("data/aspatial/WPdx.csv") %>%
filter(`#clean_country_name` == "Nigeria" & `#clean_adm1` == "Osun")Checking the content of a simple feature data frame
st_geometry(NGA_osun)Geometry set for 30 features
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 176503.2 ymin: 331434.7 xmax: 291043.8 ymax: 454520.1
Projected CRS: Minna / Nigeria Mid Belt
First 5 geometries:
Converting (aspatial data) into sf point features
We need to convert into sfc field by using st_as_sfc data type.
wp_osun_nga$Geometry = st_as_sfc(wp_osun_nga$`New Georeferenced Column`)
wp_osun_nga# A tibble: 5,557 × 71
row_id `#source` #lat_…¹ #lon_…² #repo…³ #stat…⁴ #wate…⁵ #wate…⁶ #wate…⁷
<dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
1 429123 GRID3 8.02 5.06 08/29/… Unknown <NA> <NA> Tapsta…
2 70566 Federal Minis… 7.32 4.79 05/11/… No Protec… Well Mechan…
3 70578 Federal Minis… 7.76 4.56 05/11/… No Boreho… Well Mechan…
4 66401 Federal Minis… 8.03 4.64 04/30/… No Boreho… Well Mechan…
5 422190 GRID3 7.87 4.88 08/29/… Unknown <NA> <NA> Tapsta…
6 422064 GRID3 7.7 4.89 08/29/… Unknown <NA> <NA> Tapsta…
7 65607 Federal Minis… 7.89 4.71 05/12/… No Boreho… Well Mechan…
8 68989 Federal Minis… 7.51 4.27 05/07/… No Boreho… Well <NA>
9 67708 Federal Minis… 7.48 4.35 04/29/… Yes Boreho… Well Mechan…
10 66419 Federal Minis… 7.63 4.50 05/08/… Yes Boreho… Well Hand P…
# … with 5,547 more rows, 62 more variables: `#water_tech_category` <chr>,
# `#facility_type` <chr>, `#clean_country_name` <chr>, `#clean_adm1` <chr>,
# `#clean_adm2` <chr>, `#clean_adm3` <chr>, `#clean_adm4` <chr>,
# `#install_year` <dbl>, `#installer` <chr>, `#rehab_year` <lgl>,
# `#rehabilitator` <lgl>, `#management_clean` <chr>, `#status_clean` <chr>,
# `#pay` <chr>, `#fecal_coliform_presence` <chr>,
# `#fecal_coliform_value` <dbl>, `#subjective_quality` <chr>, …
Next, we will convert the dataframe into an sf object by using st_sf(). It is important to transform the referencing system of the data into the sf format. Next we have to transform Nigeria projected coordinate system.
wp_sf_osun <- st_sf(wp_osun_nga, crs=4326) %>%
st_transform(crs = 26392)
wp_sf_osunSimple feature collection with 5557 features and 70 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 177285.9 ymin: 340054.1 xmax: 291287.1 ymax: 450859.7
Projected CRS: Minna / Nigeria Mid Belt
# A tibble: 5,557 × 71
row_id `#source` #lat_…¹ #lon_…² #repo…³ #stat…⁴ #wate…⁵ #wate…⁶ #wate…⁷
* <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
1 429123 GRID3 8.02 5.06 08/29/… Unknown <NA> <NA> Tapsta…
2 70566 Federal Minis… 7.32 4.79 05/11/… No Protec… Well Mechan…
3 70578 Federal Minis… 7.76 4.56 05/11/… No Boreho… Well Mechan…
4 66401 Federal Minis… 8.03 4.64 04/30/… No Boreho… Well Mechan…
5 422190 GRID3 7.87 4.88 08/29/… Unknown <NA> <NA> Tapsta…
6 422064 GRID3 7.7 4.89 08/29/… Unknown <NA> <NA> Tapsta…
7 65607 Federal Minis… 7.89 4.71 05/12/… No Boreho… Well Mechan…
8 68989 Federal Minis… 7.51 4.27 05/07/… No Boreho… Well <NA>
9 67708 Federal Minis… 7.48 4.35 04/29/… Yes Boreho… Well Mechan…
10 66419 Federal Minis… 7.63 4.50 05/08/… Yes Boreho… Well Hand P…
# … with 5,547 more rows, 62 more variables: `#water_tech_category` <chr>,
# `#facility_type` <chr>, `#clean_country_name` <chr>, `#clean_adm1` <chr>,
# `#clean_adm2` <chr>, `#clean_adm3` <chr>, `#clean_adm4` <chr>,
# `#install_year` <dbl>, `#installer` <chr>, `#rehab_year` <lgl>,
# `#rehabilitator` <lgl>, `#management_clean` <chr>, `#status_clean` <chr>,
# `#pay` <chr>, `#fecal_coliform_presence` <chr>,
# `#fecal_coliform_value` <dbl>, `#subjective_quality` <chr>, …
we can check the transformed projected system
st_crs(wp_sf_osun)Coordinate Reference System:
User input: EPSG:26392
wkt:
PROJCRS["Minna / Nigeria Mid Belt",
BASEGEOGCRS["Minna",
DATUM["Minna",
ELLIPSOID["Clarke 1880 (RGS)",6378249.145,293.465,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4263]],
CONVERSION["Nigeria Mid Belt",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",4,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",8.5,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.99975,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",670553.98,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",0,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Engineering survey, topographic mapping."],
AREA["Nigeria between 6°30'E and 10°30'E, onshore and offshore shelf."],
BBOX[3.57,6.5,13.53,10.51]],
ID["EPSG",26392]]
Data cleaning
- Remove unwanted columns
- As the data set contains many redundant fields, we will be selecting the columns that we want.
- I have included SD_EN for further analysis purposes.
keeps <- c(3:4, 8:9, 15:16)
NGA_osun = NGA_osun[keeps]keeps2 <- c(1,2,7:9,11:14,17,21,22,26)
wp_sf_osun = wp_sf_osun[keeps2]Check for duplicates for NGA_osun
NGA_osun$ADM2_EN[duplicated(NGA_osun$ADM2_EN)==TRUE]character(0)
In this case, there is no duplicate names, we do not need to transform any names. We can move on to the next phase which is the EDA.
Exploratory data analysis and further cleaning
In our in class exe02, we uses ggplot() of ggplot package to reveal the distribution of water point status visually.
*Note: currently my freq() for funModeling package is not working properly, hence, I’m using alternative method to plot the horizontal barchart.
ggplot(wp_sf_osun) + geom_bar(aes(y = fct_infreq(`#status_clean`), fill=`#status_clean`, stat="identity"))+ theme(legend.position = "right")
As we can see from the images, there are category which can be group together such as “Abandoned” can combine with “Abandoned/Decommissioned”. As we can see that there are quite a number of missing values. Hence, we should rename to “Unknown”.
wp_sf_osun <- wp_sf_osun %>%
rename(status_clean = '#status_clean') %>%
mutate(status_clean = replace_na(
status_clean, "Unknown"))wp_sf_osun$status_clean[wp_sf_osun$status_clean=="Abandoned"] <- "Abandoned/Decommissioned" We can plot the freq distribution diagram again.
#freq(data = wp_sf_osun,
#input = 'status_clean')
ggplot(wp_sf_osun) + geom_bar(aes(y = fct_infreq(`status_clean`), fill=`status_clean`, stat="identity"))+ theme(legend.position = "right")
Based on the image above, about to 42% of the water point are Functional, followed by 4.5% needs repair and about 1.2% are not in use. Whereas for Non-Functional, it is about 36.2% and 2.8% are non functional due to the dry season. The unknown status is at about 13.5% which is quite a large %. If let say, most of the unknown falls under non-functional, we can say that in Osun state close to 50% of the water points are not very clean and further work improvement needs to perform in the future. One of the reason for being “unknown” could be the area have not been accessed yet.
In order to have a better view, we can plot a chart based on the the (adm2) which is the Secondary Administrative Division.
ggplot(wp_sf_osun, aes(x = `#clean_adm2`, fill = status_clean)) +
geom_bar() +
coord_flip() +
theme_minimal() +
xlab("Secondary Admin Division") +
ylab("Number of status") +
scale_fill_brewer(type = "qual", palette = 3, name = "Status")
As we can see from the image above, Aiyedade has the highest number of status record ( 455 est.) as compared to others. 2nd in line would be Ejigbo (440 est.). The lowest status recorded would be Ife East with a record of 20 est.
By plotting based on the Secondary Admin Division, it is hard to interpret the area of interest e.g located in the north, south, east or west area. Notice that for NGA_osun dataframe, I have included SD_EN which shows the results of East, West and Central. Whereas for wp_sf_osun, there is no SD_EN which shows East, West or Central.
In this section, I would combine both dataset and try to find some useful insights. As mentioned by Prof Kam during in-class 05, we use cbind when the dataset has no unique identifier and it also must have the same number of row. But, in this case, I am able to use relational join even though there is no exact unique identifier but we can join on the Secondary Admin Division as it exists in both data frame. But before joining, we need to make sure that the column name is the same. We can change it to ADM2_EN for water point.
names(wp_sf_osun)[9] <- "ADM2_EN"Next, we can proceed to joining the data. But before that, lets deactivate the geometry in this new data frame for the purpose of this EDA analysis. When I first left join, some rows shows NA values, I went to cross check and notice the upper and lower case difference. With that, let’s change all names to lower case instead.
wp_sf_osun<- wp_sf_osun %>%
mutate(ADM2_EN = tolower(ADM2_EN)) NGA_osun<- NGA_osun %>%
mutate(ADM2_EN = tolower(ADM2_EN)) NGA_WP <- left_join(wp_sf_osun %>% as.data.frame(), NGA_osun %>% as.data.frame(), by = "ADM2_EN")After joining, I still have NA rows. I went to cross check and realized the difference. The spelling and “-”.
NGA_osun: aiyedire, ola-oluwa
wp_sf_osun: ayedire , ola oluwa
In the situation above, I will replace ayedire to aiyedire and ola oluwa to ola-oluwa.
wp_sf_osun$ADM2_EN <- str_replace(wp_sf_osun$ADM2_EN,"ayedire", "aiyedire")wp_sf_osun$ADM2_EN <- str_replace(wp_sf_osun$ADM2_EN,"ola oluwa", "ola-oluwa")Next, re join the 2 data frame again.
NGA_WP <- left_join(wp_sf_osun %>% as.data.frame(), NGA_osun %>% as.data.frame(), by = "ADM2_EN")Lets do a quick check if there is any NA in the column in NGA_WP data frame.
sum(is.na(NGA_WP$SD_EN))[1] 0
yay! there is no NA values in that column. Now we can plot a graph based on SD_EN column to have a clearer view of the region.
ggplot(NGA_WP, aes(x = SD_EN, fill = status_clean)) +
geom_bar() +
coord_flip() +
theme_minimal() +
xlab("SD_EN") +
ylab("Number of status") +
scale_fill_brewer(type = "qual", palette = 2, name = "Status")
As we can see from the chart above, Osun west has the most number of status being recorded. It also can the most functional water point as compared to the other 2 region. As for Non functional, Osun East has the most number being recorded.
Extracting water point data
Functional
wp_functional <- wp_sf_osun %>%
filter(status_clean %in%
c("Functional",
"Functional but not in use",
"Functional but needs repair"))Non Functional
wp_nonfunctional <- wp_sf_osun %>%
filter(status_clean %in%
c("Abandoned/Decommissioned",
"Non-Functional due to dry season",
"Non-Functional"))Unknown
wp_unknown <- wp_sf_osun %>%
filter(status_clean == "Unknown")Performing Point in Polygon count
Next, we want to find out the number of total, functional, nonfunctional and unknown water points in each Secondary Division.
NGA_wp_SubDiv <- NGA_osun %>%
mutate(`total_wp` = lengths(
st_intersects(NGA_osun, wp_sf_osun))) %>%
mutate(`wp_functional` = lengths(
st_intersects(NGA_osun, wp_functional))) %>%
mutate(`wp_nonfunctional` = lengths(
st_intersects(NGA_osun, wp_nonfunctional))) %>%
mutate(`wp_unknown` = lengths(
st_intersects(NGA_osun, wp_unknown)))Choropleth Mapping
functional_choroplot <- tm_shape(NGA_wp_SubDiv) +
tm_fill("wp_functional",
n = 5,
style = "equal",
palette = "Reds") +
tm_borders(lwd = 0.1,
alpha = 1) +
tm_text("ADM2_EN", size = .7) +
tm_layout(main.title = "Distribution of functional water point by Sub Division",
legend.outside = TRUE)nonfunctional_choroplot <- tm_shape(NGA_wp_SubDiv) +
tm_fill("wp_nonfunctional",
n = 5,
style = "equal",
palette = "Reds") +
tm_borders(lwd = 0.1,
alpha = 1) +
tm_text("ADM2_EN", size = .7) +
tm_layout(main.title = "Distribution of non-functional water point by Sub Division",
legend.outside = TRUE)functional_choroplot
nonfunctional_choroplot
tmap_mode('plot')Choropleth map group by Secondary Division
Functional
tm_shape(NGA_wp_SubDiv) +
tm_fill("wp_functional",
style = "equal",
palette = "Reds",
thres.poly = 0) +
tm_text("ADM2_EN", size = .7) +
tm_facets(by="SD_EN",
free.coords=TRUE,
drop.shapes=TRUE) +
tm_layout(legend.show = FALSE,
title.position = c("center", "center"),
title.size = 20) +
tm_borders(alpha = 0.5)
Non-Functional
tm_shape(NGA_wp_SubDiv) +
tm_fill("wp_nonfunctional",
style = "equal",
palette = "Reds",
thres.poly = 0) +
tm_text("ADM2_EN", size = .7) +
tm_facets(by="SD_EN",
free.coords=TRUE,
drop.shapes=TRUE) +
tm_layout(legend.show = FALSE,
title.position = c("center", "center"),
title.size = 20) +
tm_borders(alpha = 0.5)
Based on image above, we can see that for ejigbo has the highest distribution in functional water point. Whereas for non functional, ejigbo and aiyedade has the highest distribution in non functional water point.
However, We know that water points are not equally distributed in space. I will tabulate the proportion of functional water points and the proportion of non-functional water points in secondary division.
NGA_wp_SubDiv <- NGA_wp_SubDiv %>%
mutate(pct_functional = wp_functional/total_wp) %>%
mutate(pct_nonfunctional = wp_nonfunctional/total_wp)Plotting the map of the rate
functional_choroplot <- tm_shape(NGA_wp_SubDiv) +
tm_fill("pct_functional",
n = 5,
style = "equal",
palette = "Purples") +
tm_borders(lwd = 0.1,
alpha = 1) +
tm_layout(main.title = "Rate map of functional water point by Sub Division",
legend.outside = TRUE)nonfunctional_choroplot <- tm_shape(NGA_wp_SubDiv) +
tm_fill("pct_nonfunctional",
n = 5,
style = "equal",
palette = "Purples") +
tm_borders(lwd = 0.1,
alpha = 1) +
tm_layout(main.title = "Rate map of non-functional water point by Sub Division",
legend.outside = TRUE)tmap_arrange(functional_choroplot, nonfunctional_choroplot, nrow = 2)
Reasoning for EDA, right now, we have a clear background of osun after eda, next we will proceed with.
Population Analysis (additional findings)
In this section, I have found Nigeria’s Subnation population statistic in this website NGA_Subnation Population statistic. We can download nga_admpop_adm2_2020.csv.
In this data set, it is categorized by demographics such as age and sex on a Nigeria administrative level 0-2 . The purpose of this is to help me to identify which area with the most population are affected by water shortages and poor water quality.
Import aspatial data
nga_osun_pop <- read_csv("data/aspatial/nga_admpop_adm2_2020.csv") %>%
filter( `ADM1_NAME` == "OSUN")After looking at the data, we can keep T_TL as it is refer to the total population. But before that, we can drop unwanted columns to save memory space and clean the data.
keeps_pop <- c(1:9)
nga_osun_pop = nga_osun_pop[keeps_pop]nga_osun_pop<- nga_osun_pop %>%
mutate(ADM2_NAME = tolower(ADM2_NAME)) Next we can do some basic EDA to understand the data. Because there is some limit in the colors, we have to extend the colors by using colorRampPalette() function.
Population Bar chat
colourCount = length(unique(nga_osun_pop$ADM2_NAME))
getPalette = colorRampPalette(brewer.pal(9, "Set1"))
ggplot(nga_osun_pop, aes(x=ADM2_NAME, y=T_TL, fill=ADM2_NAME )) +
geom_bar(stat = "identity") +
coord_flip() +
xlab("ADM2_NAME") +
ylab("Number of population") +
geom_text(aes(label=T_TL), color="black", size=2.5)+
scale_fill_manual(values = getPalette(colourCount))
As we can see from the chart above ife east and iwo has the most population with 312,801 and 302,585 respectively.
Next, we can plot population choropleth mapping, but before that, we should do a join with NGA_wp_SubDiv as we previously joined NGA and wp together to plot chorpleth charts. In this case, we have a unique subdivision name ADM2_NAME in nga_osun_pop and can join together with ADM2_EN in NGA_wp_SubDiv.
Change column name:
names(nga_osun_pop)[5] <- "ADM2_EN"NGA_wp_SubDiv_pop <- left_join(NGA_osun, nga_osun_pop,
by = c("ADM2_EN"))Distribution of population by Sub Division
As shown above, there is no Coordinate system. We have to re assign them.
tmap_mode("plot")
tm_shape(NGA_wp_SubDiv_pop)+
tm_fill("T_TL",
style = "quantile",
palette = "Oranges",
legend.show = FALSE,
) +
tm_text("ADM2_EN", size = .6) +
tm_layout(main.title = "Distribution of Population by Sub Division",
main.title.position = "center",
main.title.size = 0.8) +tm_borders(alpha = 0.5)
As we can see from the chart, ejigbo, iwo, irewole, ife central, ife east and oriade is much more dense as compared to the rest. This shows that the dense area has the most population. We will be continue with this data below.
Exploratory Spatial Data Analysis (ESDA)
Previously, we have converted into SF format which is wp_sf_osun. However, the data frame is for Aspatial data. As for Geospatial data, we have not yet converting it. We have to convert sf data frames to sp’s Spatial class. We will be using NGA_osun.
NGA_osun_s <- as_Spatial(NGA_osun)Check the description summary of the new sp class data.
NGA_osun_sclass : SpatialPolygonsDataFrame
features : 30
extent : 176503.2, 291043.8, 331434.7, 454520.1 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
variables : 6
names : ADM2_EN, ADM2_PCODE, ADM1_EN, ADM1_PCODE, SD_EN, SD_PCODE
min values : aiyedade, NG030001, Osun, NG030, Osun Central, NG03001
max values : osogbo, NG030030, Osun, NG030, Osun West, NG03003
NGA_osun_sp <- as(NGA_osun_s, "SpatialPolygons")NGA_osun_spclass : SpatialPolygons
features : 30
extent : 176503.2, 291043.8, 331434.7, 454520.1 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
Next, we will be converting water point data frame into ppp format. But first, we have to convert it from sf to sp. Do note that, we have transformed the coordinate system previously hence, we do not need to transform again.
Functional and Non-Functional
any variable with _f means it is Functional
any variable with _nf means it is Non-Functional
wp_osun_f <- as_Spatial(wp_functional)wp_osun_nf <- as_Spatial(wp_nonfunctional)wp_osun_fclass : SpatialPointsDataFrame
features : 2630
extent : 177285.9, 290751, 343128.1, 450859.7 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
variables : 13
names : row_id, X.source, X.water_source_clean, X.water_source_category, X.water_tech_clean, X.facility_type, X.clean_country_name, X.clean_adm1, ADM2_EN, X.install_year, X.management_clean, status_clean, X.subjective_quality
min values : 36914, Federal Ministry of Water Resources, Nigeria, Borehole, Spring, Hand Pump, Improved, Nigeria, Osun, aiyedade, 1917, Community Management, Functional, Acceptable quality
max values : 471319, GRID3, Protected Spring, Well, Mechanized Pump - Solar, Improved, Nigeria, Osun, osogbo, 2015, School Management, Functional but not in use, No because of Taste
wp_osun_nfclass : SpatialPointsDataFrame
features : 2179
extent : 180539, 290616, 340054.1, 450780.1 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
variables : 13
names : row_id, X.source, X.water_source_clean, X.water_source_category, X.water_tech_clean, X.facility_type, X.clean_country_name, X.clean_adm1, ADM2_EN, X.install_year, X.management_clean, status_clean, X.subjective_quality
min values : 34829, Federal Ministry of Water Resources, Nigeria, Borehole, Spring, Hand Pump, Improved, Nigeria, Osun, aiyedade, 1967, Community Management, Abandoned/Decommissioned, Acceptable quality
max values : 421239, GRID3, Protected Spring, Well, Mechanized Pump - Solar, Improved, Nigeria, Osun, osogbo, 2015, School Management, Non-Functional due to dry season, No because of Taste
Next, convert sf to generic sp format of the water point.
wp_sp_osun_f <- as(wp_osun_f, "SpatialPoints")wp_sp_osun_nf <- as(wp_osun_nf, "SpatialPoints")wp_sp_osun_fclass : SpatialPoints
features : 2630
extent : 177285.9, 290751, 343128.1, 450859.7 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
wp_sp_osun_nfclass : SpatialPoints
features : 2179
extent : 180539, 290616, 340054.1, 450780.1 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
Next, we will use as.ppp() function of spatstat to convert the spatial data into spatstat’s ppp object format.
wp_ppp_osun_f <- as(wp_sp_osun_f, "ppp")
wp_ppp_osun_fPlanar point pattern: 2630 points
window: rectangle = [177285.9, 290750.96] x [343128.1, 450859.7] units
wp_ppp_osun_nf <- as(wp_sp_osun_nf, "ppp")
wp_ppp_osun_nfPlanar point pattern: 2179 points
window: rectangle = [180538.96, 290616] x [340054.1, 450780.1] units
summary(wp_ppp_osun_f)Planar point pattern: 2630 points
Average intensity 2.151545e-07 points per square unit
Coordinates are given to 2 decimal places
i.e. rounded to the nearest multiple of 0.01 units
Window: rectangle = [177285.9, 290750.96] x [343128.1, 450859.7] units
(113500 x 107700 units)
Window area = 12223800000 square units
summary(wp_ppp_osun_nf)Planar point pattern: 2179 points
Average intensity 1.787766e-07 points per square unit
Coordinates are given to 2 decimal places
i.e. rounded to the nearest multiple of 0.01 units
Window: rectangle = [180538.96, 290616] x [340054.1, 450780.1] units
(110100 x 110700 units)
Window area = 12188400000 square units
Check for duplicates.
any(duplicated(wp_ppp_osun_f))[1] FALSE
any(duplicated(wp_ppp_osun_nf))[1] FALSE
yay! there is no duplicates.
We can plot the water point data by using the code below.
tmap_mode("view")
tm_shape(wp_osun_f) +
tm_dots(alph = 0.7,
size=0.01,
palette = "Dark2",
col="status_clean") +
tm_view(set.zoom.limits = c(8,11))tmap_mode("view")
tm_shape(wp_osun_nf) +
tm_dots(alph = 0.7,
size=0.01,
palette = "Dark2",
col="status_clean") +
tm_view(set.zoom.limits = c(8,11))Notice that there are some point that is outside of Osun, Nigeria area. This is because polygon data may contain geometrical inconsistencies such as self-intersections and overlaps. These inconsistencies must be removed to prevent problems in other spatstat functions. By default, polygon data will be repaired automatically using polygon-clipping code. The repair process may change the number of vertices in a polygon and the number of polygon components. With that, we will be using owin object in the next section to help us with that.
tmap_mode('plot')Create owin object
owin is specially designed to represent this polygonal region.
NGA_osun_owin <- as(NGA_osun_sp, "owin")plot(NGA_osun_owin)
Combining point events object and owin
wpNGA_owin_f = wp_ppp_osun_f[NGA_osun_owin]wpNGA_owin_nf = wp_ppp_osun_nf[NGA_osun_owin]plot(wpNGA_owin_f)
plot(wpNGA_owin_nf)
Based on the image above, there is a little cluttered in many different areas. However, it seems that there are more records at the centre & top area of Osun, Nigeria.
1st order Spatial Point Pattern Analysis
Kernel density estimation (KDE)
Kernel density estimation maps are considered best for location, size, shape and orientation of the hotspot.
In the hands-on excises, there are a few recommendation in using methods such as, bw.CvL(), bw.scott(), bw.ppl() or bw.diggle(). But before applying the methods, we learnt from our hands-on exe that we should rescale, so as to convert the meters to kilometers.
kde_wpNGA_owin_f_bw.km <- rescale(wpNGA_owin_f, 1000, "km")
kde_wpNGA_owin_nf_bw.km <- rescale(wpNGA_owin_nf, 1000, "km")In spatstat, the functions bw.diggle(), bw.ppl(), and bw.scott() can be used to estimate the bandwidth according to difference methods. These functions run algorithms that aim to select an appropriate bandwith.
Functional
bw.diggle(kde_wpNGA_owin_f_bw.km) sigma
0.2521687
bw.ppl(kde_wpNGA_owin_f_bw.km) sigma
0.9192953
bw.scott(kde_wpNGA_owin_f_bw.km) sigma.x sigma.y
6.447815 6.379868
Non-Functional
bw.diggle(kde_wpNGA_owin_nf_bw.km) sigma
0.3082061
bw.ppl(kde_wpNGA_owin_nf_bw.km) sigma
0.9737385
bw.scott(kde_wpNGA_owin_nf_bw.km) sigma.x sigma.y
6.338388 7.018473
As we can see the Diggle algorithm gives us a narrower bandwidth for both functional and non-functional water point. However, Baddeley et (2016) suggest the use of the bw.ppl() algorithm because in their experience it tends to produce the more appropriate values when the pattern consists predominantly of tight clusters.
Hence, in this take home, I decided to use ppl() for my analysis.
kde_wpNGA_owin_bw_f <- density(kde_wpNGA_owin_f_bw.km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian") kde_wpNGA_owin_bw_nf <- density(kde_wpNGA_owin_nf_bw.km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian") par(mfrow=c(1,2))
plot(kde_wpNGA_owin_bw_f)
plot(kde_wpNGA_owin_bw_nf)
Converting KDE output into grid object
Now, we have to convert our KDE outputs into RasterLayer objects. Since we can’t do that directly, we’ll need to convert them into a SpatialGridDataFrame first, then convert the SpatialGridDataFrame into RasterLayer objects:
gridded_kde_wpNGA_owin_bw_f <- as.SpatialGridDataFrame.im(kde_wpNGA_owin_bw_f)
spplot(gridded_kde_wpNGA_owin_bw_f)
gridded_kde_wpNGA_owin_bw_nf <- as.SpatialGridDataFrame.im(kde_wpNGA_owin_bw_nf)
spplot(gridded_kde_wpNGA_owin_bw_nf)
Converting gridded output into raster
Next, we will convert the gridded kernal density objects into RasterLayer object by using raster() of raster package.
kde_wpNGA_owin_bw_raster_f <- raster(gridded_kde_wpNGA_owin_bw_f)kde_wpNGA_owin_bw_raster_nf <- raster(gridded_kde_wpNGA_owin_bw_nf)kde_wpNGA_owin_bw_raster_fclass : RasterLayer
dimensions : 128, 128, 16384 (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045 (x, y)
extent : 176.5032, 291.0438, 331.4347, 454.5201 (xmin, xmax, ymin, ymax)
crs : NA
source : memory
names : v
values : -4.99773e-16, 10.55944 (min, max)
kde_wpNGA_owin_bw_raster_nfclass : RasterLayer
dimensions : 128, 128, 16384 (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045 (x, y)
extent : 176.5032, 291.0438, 331.4347, 454.5201 (xmin, xmax, ymin, ymax)
crs : NA
source : memory
names : v
values : -2.52505e-16, 9.25861 (min, max)
Notice that the crs property is NA. We have to assign to appropriate projection system with the correct unit of measurement.
projection(kde_wpNGA_owin_bw_raster_f) <- CRS("+init=EPSG:26392 +units=km" )
kde_wpNGA_owin_bw_raster_fclass : RasterLayer
dimensions : 128, 128, 16384 (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045 (x, y)
extent : 176.5032, 291.0438, 331.4347, 454.5201 (xmin, xmax, ymin, ymax)
crs : +init=EPSG:26392 +units=km
source : memory
names : v
values : -4.99773e-16, 10.55944 (min, max)
projection(kde_wpNGA_owin_bw_raster_nf) <- CRS("+init=EPSG:26392 +units=km" )
kde_wpNGA_owin_bw_raster_nfclass : RasterLayer
dimensions : 128, 128, 16384 (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045 (x, y)
extent : 176.5032, 291.0438, 331.4347, 454.5201 (xmin, xmax, ymin, ymax)
crs : +init=EPSG:26392 +units=km
source : memory
names : v
values : -2.52505e-16, 9.25861 (min, max)
Now we can see the crs property completed.
Visualising the output on OpenStreetMap
density_map <- function(rasterObj, map_title) {
tm_basemap("OpenStreetMap") +
tm_shape(rasterObj) +
tm_raster("v", alpha=0.65) +
tm_layout(legend.position = c("right", "bottom"),
legend.height = 0.5,
legend.width = 0.4,
main.title = map_title,
main.title.position = 'center',
main.title.size = 1,
frame = FALSE)
}density_map(kde_wpNGA_owin_bw_raster_f, map_title = "Osun Functional Water Point Density Map")
density_map(kde_wpNGA_owin_bw_raster_nf, map_title = "Osun Non-Functional Water Point Density Map")
Comparing Spatial Point Pattern using KDE
Previously, we have included the region in the data set, in this section, we will be comparing KDE water point at Central, East and West region.
Extract the study area
oe = NGA_osun_s[NGA_osun_s$SD_EN == 'Osun East',]
ow = NGA_osun_s[NGA_osun_s$SD_EN == 'Osun West',]
oc = NGA_osun_s[NGA_osun_s$SD_EN == 'Osun Central',]par(mfrow=c(1,3))
plot(oc, main="Osun Central")
plot(oe, main="Osun East")
plot(ow, main="Osun West")
oe_sp = as(oe, "SpatialPolygons")
ow_sp = as(ow, "SpatialPolygons")
oc_sp = as(oc, "SpatialPolygons")oe_owin = as(oe_sp, "owin")
ow_owin = as(ow_sp, "owin")
oc_owin = as(oc_sp, "owin")Combining Water points and the study area
wp_oe_ppp_f = wp_ppp_osun_f[oe_owin]
wp_ow_ppp_f = wp_ppp_osun_f[ow_owin]
wp_oc_ppp_f = wp_ppp_osun_f[oc_owin]wp_oe_ppp_nf = wp_ppp_osun_nf[oe_owin]
wp_ow_ppp_nf = wp_ppp_osun_nf[ow_owin]
wp_oc_ppp_nf = wp_ppp_osun_nf[oc_owin]Next, rescale() function is used to transform the unit of measurement from m to km
wp_oe_ppp_f.km = rescale(wp_oe_ppp_f, 1000, "km")
wp_ow_ppp_f.km = rescale(wp_ow_ppp_f, 1000, "km")
wp_oc_ppp_f.km = rescale(wp_oc_ppp_f, 1000, "km")wp_oe_ppp_nf.km = rescale(wp_oe_ppp_nf, 1000, "km")
wp_ow_ppp_nf.km = rescale(wp_ow_ppp_nf, 1000, "km")
wp_oc_ppp_nf.km = rescale(wp_oc_ppp_nf, 1000, "km")plot the 3 study area and the location of the water point
par(mfrow=c(1,3))
plot(wp_oc_ppp_f.km, main="Osun Central Functional")
plot(wp_oe_ppp_f.km, main="Osun East Functional")
plot(wp_ow_ppp_f.km, main="Osun West Functional")
par(mfrow=c(1,3))
plot(wp_oc_ppp_nf.km, main="Osun Central Non Functional")
plot(wp_oe_ppp_nf.km, main="Osun East Non Functional")
plot(wp_ow_ppp_nf.km, main="Osun West Non Functional")
Computing KDE by Region
Over here, we will continue to use bw.ppl too so as to make a non bias analysis.
par(mfrow=c(1,3))
plot(density(wp_oc_ppp_f.km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian"),
main="Osun Central Functional")
plot(density(wp_oe_ppp_f.km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian"),
main="Osun East Functional")
plot(density(wp_ow_ppp_f.km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian"),
main="Osun West Functional")
par(mfrow=c(1,3))
plot(density(wp_oc_ppp_nf.km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian"),
main="Osun Central Non Functional")
plot(density(wp_oe_ppp_nf.km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian"),
main="Osun East Non Functional")
plot(density(wp_ow_ppp_nf.km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian"),
main="Osun West Non Functional")
Based on the 3 graph above, we can see that central area seems more concentrated. Although, it is a little hard to narrow down into the division. I will be extracting Osun’s population based on region to determine which area we should be analyse on.
tm_shape(NGA_wp_SubDiv_pop) +
tm_fill("T_TL",
style = "equal",
palette = "Oranges",
thres.poly = 0) +
tm_text("ADM2_EN", size = .7) +
tm_facets(by="SD_EN",
free.coords=TRUE,
drop.shapes=TRUE) +
tm_layout(legend.show = FALSE,
title.position = c("center", "center"),
title.size = 20) +
tm_borders(alpha = 0.5)
As mentioned above here, ejigbo, iwon, irewole, oriade, ife central and ife east is most dense and from the bar chart above here ife east and iwo has the most population with 312,801 and 302,585 respectively.
With the findings, I will be narrowing to the 2 most populated division for further analysis:
- ife east
- iwo
Extract the study area for ife east & iwo
ife_east = NGA_osun_s[NGA_osun_s$ADM2_EN == 'ife east',]
iwo = NGA_osun_s[NGA_osun_s$ADM2_EN == 'iwo',]ife_east_sp = as(ife_east, "SpatialPolygons")
iwo_sp = as(iwo, "SpatialPolygons")ife_east_owin = as(ife_east_sp, "owin")
iwo_owin = as(iwo_sp, "owin")wp_ife_east_ppp_f = wp_ppp_osun_f[ife_east_owin]
wp_iwo_ppp_f = wp_ppp_osun_f[iwo_owin]
wp_ife_east_ppp_nf = wp_ppp_osun_nf[ife_east_owin]
wp_iwo_ppp_nf = wp_ppp_osun_nf[iwo_owin]
wp_ife_east_ppp_f.km = rescale(wp_ife_east_ppp_f, 1000, "km")
wp_iwo_ppp_f.km = rescale(wp_iwo_ppp_f, 1000, "km")
wp_ife_east_ppp_nf.km = rescale(wp_ife_east_ppp_nf, 1000, "km")
wp_iwo_ppp_nf.km = rescale(wp_iwo_ppp_nf, 1000, "km")kde_ife_east_f <- density(wp_ife_east_ppp_f.km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian")
kde_iwo_f <- density(wp_iwo_ppp_f.km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian")par(mfrow=c(1,2))
plot( kde_ife_east_f,
main="ife_east Functional")
plot( kde_iwo_f,
main="iwo Functional")
kde_ife_east_nf <- density(wp_ife_east_ppp_nf.km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian")
kde_iwo_nf <- density(wp_iwo_ppp_nf.km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian")par(mfrow=c(1,2))
plot( kde_ife_east_nf,
main="ife_east Non Functional")
plot( kde_iwo_nf,
main="iwo Non Functional")
Converting KDE output into grid object
gridded_kde_ife_east_f<- as.SpatialGridDataFrame.im(kde_ife_east_f)
gridded_kde_iwo_f <- as.SpatialGridDataFrame.im(kde_iwo_f)
gridded_kde_ife_east_nf <- as.SpatialGridDataFrame.im(kde_ife_east_nf)
gridded_kde_iwo_nf <- as.SpatialGridDataFrame.im(kde_iwo_nf)par(mfrow=c(1,2))
spplot(gridded_kde_ife_east_f)
spplot(gridded_kde_iwo_f)
par(mfrow=c(1,2))
spplot(gridded_kde_ife_east_nf)
spplot(gridded_kde_iwo_nf)
Converting gridded output into raster
kde_ife_east_f_raster <- raster(gridded_kde_ife_east_f)
kde_iwo_f_raster <- raster(gridded_kde_iwo_f)
kde_ife_east_nf_raster <- raster(gridded_kde_ife_east_nf)
kde_iwo_nf_raster <- raster(gridded_kde_iwo_nf)projection(kde_ife_east_f_raster) <- CRS("+init=EPSG:26392 +units=km")
projection(kde_iwo_f_raster) <- CRS("+init=EPSG:26392 +units=km")
projection(kde_ife_east_nf_raster) <- CRS("+init=EPSG:26392 +units=km")
projection(kde_iwo_nf_raster) <- CRS("+init=EPSG:26392 +units=km")par(mfrow=c(2,4))
tm_shape(kde_ife_east_f_raster) +
tm_raster("v") +
tm_layout(legend.position = c("right", "bottom"), frame = FALSE,
main.title = "ife east Funtional",
main.title.position = "center",
main.title.size = 0.8)
tm_shape(kde_iwo_f_raster) +
tm_raster("v") +
tm_layout(legend.position = c("right", "bottom"), frame = FALSE,
main.title = "iwo Functional",
main.title.position = "center",
main.title.size = 0.8)
tm_shape(kde_ife_east_nf_raster) +
tm_raster("v") +
tm_layout(legend.position = c("right", "bottom"), frame = FALSE,
main.title = "ife east Non Functional",
main.title.position = "center",
main.title.size = 0.8)
tm_shape(kde_iwo_nf_raster) +
tm_raster("v") +
tm_layout(legend.position = c("right", "bottom"), frame = FALSE,
main.title = "iwo Non Functional",
main.title.position = "center",
main.title.size = 0.8)
Highlight the advantage of kernel density map over point map
The advantage of kernel density map over point map lies in the ability of the former to provide a smoother representation of the data distribution. Unlike point map which simply plots individual data points, the kernel density map uses a mathematical technique to estimate the underlying probability density function of the data and provides a smooth estimate of the data density over the entire region. This smoothed representation of the data gives a clearer picture of the underlying distribution pattern, making it easier to identify trends and anomalies. Additionally, the use of color gradients in kernel density maps allows for a clearer representation of the distribution range, making it easier to interpret the results.
2nd Order Spatial Point Pattern Analysis
Now that we have analysed the spatial point patterns, we have to confirm our observation statistically. I have narrowed down on my observation to ife east and iwo area which is the most populated area. The nearest-neighbour distance is the measure of distance from each point to its nearest neighbour. G-function measures the distribution of distances from an arbitrary event to its nearest neighbour.
Ife East Functional - G Function Gest()
H0: The distribution of the Functional water points at ife east are randomly distributed
H1: The distribution of the Functional water points at ife east are not randomly distributed
Confidence level at 95%
Significance level: 0.05
The null hypothesis will be rejected if p-value is smaller than alpha value of 0.05
G_ife_east_f = Gest(wp_ife_east_ppp_f, correction = "border")
plot(G_ife_east_f, xlim=c(0,500))
G_ife_east_f.csr <- envelope(wp_ife_east_ppp_f, Gest, nsim = 95)Generating 95 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95.
Done.
plot(G_ife_east_f.csr)
Conclusion: The observed G(r) is above the G(theo) as well as the envelope. This shows that functional water point in ife east are clustered. Hence, we will reject the null hypothesis that the functional water points at ife east are randomly distributed.
Iwo Functional - G Function
H0: The distribution of the Functional water points at iwo are randomly distributed
H1: The distribution of the Functional water points at iwo are not randomly distributed
Confidence level at 95%
Significance level: 0.05
The null hypothesis will be rejected if p-value is smaller than alpha value of 0.05
G_iwo_f = Gest(wp_iwo_ppp_f, correction = "border")
plot(G_iwo_f, xlim=c(0,500))
G_iwo_f.csr <- envelope(wp_iwo_ppp_f, Gest, nsim = 39)Generating 39 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39.
Done.
plot(G_iwo_f.csr)
Conclusion: The observed G(r) is far above the G(theo) as well as the envelope. This shows that functional water point in iwo are clustered. Hence, we will reject the null hypothesis that the functional water points at iwo are randomly distributed at 95% confident interval.
Ife East Non Functional - G Function
H0: The distribution of the Non functional water points at ife east are randomly distributed
H1: The distribution of the Non Functional water points at ife east are not randomly distributed
Confidence level at 95%
Significance level: 0.05
The null hypothesis will be rejected if p-value is smaller than alpha value of 0.05
G_ife_east_nf = Gest(wp_ife_east_ppp_nf, correction = "border")
plot(G_ife_east_nf, xlim=c(0,500))
G_ife_east_nf.csr <- envelope(wp_ife_east_ppp_nf, Gest, nsim = 39)Generating 39 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39.
Done.
plot(G_ife_east_nf.csr)
Conclusion: The observed G(r) is above the G(theo) as well as the envelope. This shows that non functional water point in ife east are clustered. Hence, we will reject the null hypothesis that the non functional water points at ife east are randomly distributed.
Iwo Non Functional - G Function
H0: The distribution of the Non functional water points at iwo are randomly distributed
H1: The distribution of the Non Functional water points at iwo are not randomly distributed
Confidence level at 95%
Significance level: 0.05
The null hypothesis will be rejected if p-value is smaller than alpha value of 0.05
G_iwo_nf = Gest(wp_iwo_ppp_nf, correction = "border")
plot(G_iwo_nf, xlim=c(0,500))
G_iwo_nf.csr <- envelope(wp_iwo_ppp_nf, Gest, nsim = 39)Generating 39 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39.
Done.
plot(G_iwo_nf.csr)
Conclusion: The observed G(r) is far above the G(theo) as well as the envelope. This shows that non functional water point in iwo are clustered. Hence, we will reject the null hypothesis that the functional water points at iwo are randomly distributed at 95% confident interval.
Spatial Correlation Analysis
Previously, we have visualize geospatial data on the different status here. Having to further dive in, we can look into the subjective quality in both functional and non functional.
Let’s take a look at quality column.
unique(wp_functional$`#subjective_quality`)[1] "Acceptable quality" "No because of Colour" "No because of Odour"
[4] "No because of Taste"
unique(wp_nonfunctional$`#subjective_quality`)[1] "Acceptable quality" "No because of Taste" "No because of Odour"
[4] "No because of Colour"
Based on the result, both functional and non functional has 4 unique values.
Let’s plot them to have a better idea.
tmap_mode("view")
tm_basemap("OpenStreetMap") +
tm_shape(wp_functional) +
tm_dots(col = '#subjective_quality', size = 0.02, title="Quality Type" ,alpha=0.6,
palette = c("#e76f51", "#e9c46a","#2a9d8f"))tm_basemap("OpenStreetMap") +
tm_shape(wp_nonfunctional) +
tm_dots(col = '#subjective_quality', size = 0.02, title="Quality Type" ,alpha=0.6,
palette = c("#e76f51", "#e9c46a","#2a9d8f"))tmap_mode("plot")As we are working with marked data, and we know that the values are categorical (different room types), we need to ensure that the marked field is of factor data type. However, as seen from the output, our room_type field is of chr data type, not factor! Let's rectify that with the as.factor() function:
wp_osun_f@data$`#subjective_quality` <-as.factor(wp_osun_f@data$`X.subjective_quality`)wp_osun_nf@data$`X.subjective_quality` <-as.factor(wp_osun_nf@data$`X.subjective_quality`)wp_functional_ppp_factor <- as(wp_osun_f, "ppp")
wp_nonfunctional_ppp_factor <- as(wp_osun_nf, "ppp") Correlation
In this section, I will be using ife east and iwo owin object. We will be looking at:
ife east quality of water in functional
ife east quality of water in non functional
iwo quality of water in functional
iwo quality of water in non functional
ife_east_marked_ppp <- wp_functional_ppp_factor[ife_east_owin]
iwo_marked_ppp <- wp_functional_ppp_factor[iwo_owin]ife_east_marked_ppp_nf <- wp_nonfunctional_ppp_factor[ife_east_owin]
iwo_marked_ppp_nf <- wp_nonfunctional_ppp_factor[iwo_owin]par(mfrow=c(1,2))
plot(ife_east_marked_ppp, main = "ife east F", which.marks = "#subjective_quality")
plot(ife_east_marked_ppp_nf, main = "ife east NF", which.marks = "X.subjective_quality")
par(mfrow=c(1,2))
plot(iwo_marked_ppp, main = "iwo F", which.marks = "#subjective_quality")
plot(iwo_marked_ppp_nf, main = "iwo NF", which.marks = "X.subjective_quality")
In this section, hypothesis testing will be conducted utilising second-order statistics (L function), to assess if the spatial distribution of functional and non-functional water points are independent from each other.
2nd order multi-type point pattern analysis - Cross L function
plot((density(split(rescale(ife_east_marked_ppp, 1000)))))
plot((density(split(rescale(ife_east_marked_ppp_nf, 1000)))))
Based on the graphs above, acceptable quality and cause of taste seems to have a stronger complementary relationship with each other in ife east area for both functional and non functional area. However, Acceptable quality being the most strong ones due to the scale.
plot((density(split(rescale(iwo_marked_ppp, 1000)))))
plot((density(split(rescale(iwo_marked_ppp_nf, 1000)))))
Based on the graphs above, for functional in iwo area, most of the charts has a stronger complementary relationship with each other except for odour. However, Acceptable quality being the most strong ones due to the scale. As for non functional, most of the charts has a strong complementary relationship too except for color chart, there is a slight difference on the area. However, Acceptable quality also being the most strong ones due to the scale.